Transcriptomics part 1
Eötvös Loránd University, Budapest & Biological Research Centre, Szeged
March 28, 2025
Introduction
RNA-seq data analysis
Transcriptome:
Types of different RNAs:
Low-throughput RNA profiling methods:
High-throughput RNA profiling methods
Extract expressed RNA, sequence it → FASTQ file
Pre-mapping quality checking, trimming (filtering)
Read mapping to reference genome OR de novo assembly of transcripts
Read counting
Quantitative Analyses: comparing expression levels
Functional enrichment analysis: GO, pathways…
Align reads to the genome
A standard DNA mapper will not map reads that span two exons (i.e. reads that cross a splice junction).
Splice-aware aligners: GSNAP, STAR, TopHat2, Rsubread, etc.
Be aware: alignment errors may occur!
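In SAM output, a spliced alignment is recorded with an `N` operation (skipped reference region) in the CIGAR string. A minimal sketch of how intron coordinates could be recovered from POS and CIGAR; the function name `splice_junctions` and the example values are hypothetical, not taken from any of the aligners above:

```python
import re

# One CIGAR operation: a length followed by an operation code.
CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")

def splice_junctions(pos, cigar):
    """Return (intron_start, intron_end) pairs, 1-based inclusive,
    for every N operation in the CIGAR string."""
    junctions = []
    ref = pos  # reference coordinate of the next base to be consumed
    for length, op in CIGAR_OP.findall(cigar):
        length = int(length)
        if op == "N":
            junctions.append((ref, ref + length - 1))
        # Only these operations advance along the reference.
        if op in "MDN=X":
            ref += length
    return junctions

# A 100 bp read split over two exons with a 1000 bp intron in between.
print(splice_junctions(100, "50M1000N50M"))  # [(150, 1149)]
```

A read with a CIGAR such as `100M` returns an empty list, i.e. it lies within a single exon.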
| Col | Field | Type | Brief description |
|---|---|---|---|
| 1 | QNAME | String | Query template NAME |
| 2 | FLAG | Int | bitwise FLAG |
| 3 | RNAME | String | Reference sequence NAME |
| 4 | POS | Int | 1-based leftmost mapping POSition |
| 5 | MAPQ | Int | MAPping Quality |
| 6 | CIGAR | String | CIGAR string |
| 7 | RNEXT | String | Ref. name of the mate/next read |
| 8 | PNEXT | Int | Position of the mate/next read |
| 9 | TLEN | Int | observed Template LENgth |
| 10 | SEQ | String | segment SEQuence |
| 11 | QUAL | String | ASCII of Phred-scaled base QUALity+33 |
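The eleven mandatory fields are tab-separated, so one alignment line can be split directly. A minimal sketch, assuming a record with no optional tags of interest; `SamRecord`, `parse_sam_line`, `phred_scores` and the example read are illustrative names and data, not part of any library:

```python
from typing import NamedTuple

class SamRecord(NamedTuple):
    qname: str
    flag: int
    rname: str
    pos: int
    mapq: int
    cigar: str
    rnext: str
    pnext: int
    tlen: int
    seq: str
    qual: str

def parse_sam_line(line):
    """Split one alignment line into the 11 mandatory SAM fields."""
    f = line.rstrip("\n").split("\t")
    return SamRecord(f[0], int(f[1]), f[2], int(f[3]), int(f[4]),
                     f[5], f[6], int(f[7]), int(f[8]), f[9], f[10])

def phred_scores(qual):
    """Convert the QUAL string (Phred+33 ASCII) to integer scores."""
    if qual == "*":  # '*' means qualities were not stored
        return []
    return [ord(c) - 33 for c in qual]

# Made-up record for illustration only.
example = "read1\t0\tchr1\t100\t60\t4M\t*\t0\t0\tACGT\tII5!"
rec = parse_sam_line(example)
print(rec.rname, rec.pos, phred_scores(rec.qual))
# chr1 100 [40, 40, 20, 0]
```

Header lines (starting with `@`) are not alignment records and would need to be skipped before calling `parse_sam_line`.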
| Bit | Decimal | Description of read |
|---|---|---|
| 1 | 1 | Read paired |
| 2 | 2 | Read mapped in proper pair |
| 3 | 4 | Read unmapped |
| 4 | 8 | Mate unmapped |
| 5 | 16 | Read reverse strand |
| 6 | 32 | Mate reverse strand |
| 7 | 64 | First in pair |
| 8 | 128 | Second in pair |
| 9 | 256 | Not primary alignment |
| 10 | 512 | Read fails platform/vendor quality checks |
| 11 | 1024 | Read is PCR or optical duplicate |
| 12 | 2048 | Supplementary alignment |
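The FLAG field is the sum of the decimal values of all bits that are set, so it can be decoded with bitwise AND. A minimal sketch using the table above; `FLAG_BITS` and `decode_flag` are hypothetical names:

```python
# (decimal value, description) pairs, as in the table above.
FLAG_BITS = [
    (1, "read paired"),
    (2, "read mapped in proper pair"),
    (4, "read unmapped"),
    (8, "mate unmapped"),
    (16, "read reverse strand"),
    (32, "mate reverse strand"),
    (64, "first in pair"),
    (128, "second in pair"),
    (256, "not primary alignment"),
    (512, "read fails platform/vendor quality checks"),
    (1024, "read is PCR or optical duplicate"),
    (2048, "supplementary alignment"),
]

def decode_flag(flag):
    """List the descriptions of all bits set in a SAM FLAG value."""
    return [name for bit, name in FLAG_BITS if flag & bit]

# 99 = 1 + 2 + 32 + 64
print(decode_flag(99))
# ['read paired', 'read mapped in proper pair', 'mate reverse strand', 'first in pair']
```

The mate of such a read typically carries FLAG 147 (1 + 2 + 16 + 128): paired, proper pair, read on the reverse strand, second in pair.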
Find reads that map to coding sequence
Genome annotation: GTF (GFF, SAF, …) file: